
Fallback to use the CUDA primary context #189

Merged: 18 commits into rapidsai:branch-23.04 on Apr 3, 2023

Conversation

@madsbk (Member) commented Mar 29, 2023

@madsbk madsbk added bug Something isn't working non-breaking Introduces a non-breaking change labels Mar 29, 2023
@madsbk madsbk marked this pull request as ready for review March 29, 2023 12:18
@madsbk madsbk requested a review from wence- March 29, 2023 12:18
@wence- (Contributor) left a comment

Can we work through the context-getting logic? I am not sure it is right.

Comment on lines 114 to 119
if (err != CUDA_ERROR_INVALID_VALUE) {
  // If this isn't the case, we grab the current context
  CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx));
} else {
  CUDA_DRIVER_TRY(err);  // Check for errors from previous asynchronous launches
}
@wence- (Contributor)

I think this logic is wrong? Let's try and work through it:

The options for the allocation are:

  1. stream ordered: CU_POINTER_ATTRIBUTE_CONTEXT will return CUDA_SUCCESS and put ctx = nullptr
  2. not stream ordered: CU_POINTER_ATTRIBUTE_CONTEXT will return CUDA_SUCCESS and put ctx = some_context
  3. not a UVM allocation: CU_POINTER_ATTRIBUTE_CONTEXT will return CUDA_ERROR_INVALID_VALUE
  4. Some other thing: some other cuda error

So if we hit case 3 or 4, we want to error; if we hit case 2, we want to return the context that the query returned; if we hit case 1, we need to do some more work. I think we can just use the CU_POINTER_ATTRIBUTE_DEVICE_POINTER query and then do something based on the result (either return a non-null current context, or else build a primary context and return that).

So perhaps:

CUcontext ctx;
// Either the pointer is valid (CUDA_SUCCESS), or it isn't (some other error)
CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute(&ctx, CU_POINTER_ATTRIBUTE_CONTEXT, dev_ptr));
if (ctx != nullptr) {
  // Not a stream-ordered allocation; this context is fine.
  return ctx;
}
// Stream-ordered allocation: let's see if the current device context exists and we can use it
CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx));
if (ctx != nullptr) {
  CUdeviceptr target{};
  auto const err = cudaAPI::instance().PointerGetAttribute(&target, CU_POINTER_ATTRIBUTE_DEVICE_POINTER, dev_ptr);
  if (err == CUDA_SUCCESS && target == dev_ptr) { return ctx; }
}
// Either no current context, or the current context can't access the device pointer?
int ord = get_device_ordinal_from_pointer(dev_ptr);
_primary_contexts.try_emplace(ord, ord);
return _primary_contexts.at(ord).ctx;

WDYT?

@madsbk (Member, author)

You are right: I messed up the equality checks, and we don't need to check the accessibility of dev_ptr when the CU_POINTER_ATTRIBUTE_CONTEXT query succeeds. However, we must survive a CUDA_ERROR_INVALID_VALUE error.

@madsbk madsbk requested a review from wence- March 29, 2023 14:49
CUcontext ctx;
const CUresult err =
  cudaAPI::instance().PointerGetAttribute(&ctx, CU_POINTER_ATTRIBUTE_CONTEXT, dev_ptr);
if (err != CUDA_ERROR_INVALID_VALUE) {
@wence- (Contributor)

Why do we fall through if the result is CUDA_ERROR_INVALID_VALUE? Ah, I guess if we don't have UVM, this might still be a device pointer?

@madsbk (Member, author)

We don't want to crash when getting CUDA_ERROR_INVALID_VALUE, so we cannot use CUDA_DRIVER_TRY(). Instead, we want to get the current context (if any).

CUdeviceptr current_ctx_dev_ptr{};
CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute(
&current_ctx_dev_ptr, CU_POINTER_ATTRIBUTE_DEVICE_POINTER, dev_ptr));
if (current_ctx_dev_ptr != dev_ptr) { return ctx; }
@wence- (Contributor) commented Mar 29, 2023

I think I don't understand this check. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1g0c28ed0aff848042bc0533110e45820c says:

Returns in *data the device pointer value through which ptr may be accessed by kernels running in the current CUcontext. The type of data must be CUdeviceptr *.

If there exists no device pointer value through which kernels running in the current CUcontext may access ptr then CUDA_ERROR_INVALID_VALUE is returned.

If there is no current CUcontext then CUDA_ERROR_INVALID_CONTEXT is returned.

Except in the exceptional disjoint addressing cases discussed below, the value returned in *data will equal the input value ptr.

So my reading is that if the driver call was successful, then current_ctx_dev_ptr will either be the same as dev_ptr (usual case) or different (but still valid) in the corner cases where dev_ptr was allocated either by cuMemHostRegister or cuMemHostAlloc with CU_MEMHOSTALLOC_WRITECOMBINED.

In either case, the current context is good for accessing the pointer.

So I think:

Suggested change:
- if (current_ctx_dev_ptr != dev_ptr) { return ctx; }
+ if (current_ctx_dev_ptr == dev_ptr) { return ctx; }

@madsbk (Member, author)

If dev_ptr != current_ctx_dev_ptr, we cannot access dev_ptr using the current context, right?

@wence- (Contributor)

No, it means that the memory was allocated in one of the special ways listed above. The context is capable of accessing the data, but must do so through current_ctx_dev_ptr (not dev_ptr). Since this function is only asking "what context is good for accessing this pointer?" I think we should always return the ctx here.

@wence- (Contributor) left a comment

Still one quibble about the case where we have a context and have queried the device pointer attribute.

Co-authored-by: Lawrence Mitchell <[email protected]>
@madsbk madsbk marked this pull request as draft March 29, 2023 16:12
@madsbk (Member, author) commented Mar 29, 2023

Marked the PR as a draft again. I will write some tests tomorrow before we merge.

Co-authored-by: Lawrence Mitchell <[email protected]>
CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx));
if (ctx != nullptr) {
  CUdeviceptr current_ctx_dev_ptr{};
  CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute(
@wence- (Contributor)

We could have CUDA_ERROR_INVALID_VALUE here (the current context is not set up for allowing us to access this pointer). Is that a recoverable error?

@wence- (Contributor)

Should certainly handle other errors as failure.

// Finally, if we didn't find any usable context, we return the primary context of the
// device that owns `dev_ptr`. Note that we use `_primary_contexts` to cache the primary
// context of each device.
int ordinal = get_device_ordinal_from_pointer(dev_ptr);
@wence- (Contributor)

If dev_ptr was allocated by cuMemHostAlloc with CU_MEMHOSTALLOC_WRITECOMBINED, then it is the "host address": it looks like a UVM address, but it is unmapped on the device, so it cannot be used in a kernel, even with this primary context that we just made. In that case, the pointer that is usable on the device is the one obtained by querying CU_POINTER_ATTRIBUTE_DEVICE_POINTER.

@wence- (Contributor)

After setting the primary context, I think that if we go round and query CU_POINTER_ATTRIBUTE_DEVICE_POINTER again and find that the result is different from the input, then we get an error.

@wence- (Contributor)

Maybe we should try to clarify some more with the CUDA team.

@wence- (Contributor) left a comment

Thanks, just some minor debugging leftovers to clean up, I think.

@madsbk madsbk marked this pull request as ready for review March 30, 2023 12:21
@madsbk (Member, author) commented Mar 30, 2023

Thanks for the help @wence- !

@madsbk madsbk requested a review from wence- March 30, 2023 14:01
@wence- (Contributor) left a comment

Thanks @madsbk! How shall we test that this works for the initial reported cudf bug?

@madsbk (Member, author) commented Mar 31, 2023

> How shall we test that this works for the initial reported cudf bug?

In test_no_current_cuda_context(), we test the underlying problem of the reported cudf+dask bug.
But it might be too expensive in terms of CI resources to test the exact cudf code.
Since KvikIO is header-only, CI would have to build cudf every time.

@wence- (Contributor) commented Mar 31, 2023

> How shall we test that this works for the initial reported cudf bug?

> In test_no_current_cuda_context(), we test the underlying problem of the reported cudf+dask bug. But it might be too expensive in terms of CI resources to test the exact cudf code. Since KvikIO is header-only, CI would have to build cudf every time.

I was going to just build locally and check?

@madsbk (Member, author) commented Mar 31, 2023

> I was going to just build locally and check?

Ahh sure, I have done that continually :)

To build cudf using this branch, point the build.sh script at the KvikIO source like:

diff --git a/build.sh b/build.sh
index bee66d819b..b6bcf5ce75 100755
--- a/build.sh
+++ b/build.sh
@@ -288,6 +288,7 @@ if buildAll || hasArg libcudf; then
           -DDISABLE_DEPRECATION_WARNINGS=${BUILD_DISABLE_DEPRECATION_WARNINGS} \
           -DCUDF_USE_PER_THREAD_DEFAULT_STREAM=${BUILD_PER_THREAD_DEFAULT_STREAM} \
           -DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
+          -DCPM_KvikIO_SOURCE=/home/mkristensen/repos/kvikio \
           ${EXTRA_CMAKE_ARGS}
 
     cd ${LIB_BUILD_DIR}

@wence- (Contributor) commented Apr 3, 2023

Can confirm this fixes the reported cudf issue.

@wence- (Contributor) commented Apr 3, 2023

/merge

@rapids-bot rapids-bot bot merged commit 5978055 into rapidsai:branch-23.04 Apr 3, 2023
@madsbk madsbk deleted the use_cuda_primary_context branch April 11, 2023 06:46
Labels
bug: Something isn't working
non-breaking: Introduces a non-breaking change
Development

Successfully merging this pull request may close these issues.

[BUG] dd.from_map with cudf read_json not working when pool is enabled
2 participants